# A tibble: 5 × 8
target stage_expressed gene_id_3d7 chrom coords_3d7_amplified seq_length
<chr> <chr> <chr> <chr> <chr> <dbl>
1 *ama1* blood PF3D7_1133400 11 1294312-1294613 300
2 *csp* liver PF3D7_0304600 03 221351-221640 288
3 *msp7* blood PF3D7_1335100 13 1419236-1419567 330
4 *sera2* blood PF3D7_0207900 02 320762-321022 259
5 *trap* liver PF3D7_1335900 13 1465058-1465379 320
# ℹ 2 more variables: gc_content <chr>, pf6k_variant_positions <chr>
Summary of pairwise distances
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 2.000 4.000 4.956 7.000 15.000
# A tibble: 3 × 6
  dens        raw_reads n_reads_dada2     n mean_all_raw mean_all_dada2
  <fct>           <dbl>         <dbl> <int>        <dbl>          <dbl>
1 <1.5 g/µL     4289511       1253629    90       47661.         13929.
2 1.5-75 g/µL   7958631       2577615    90       88429.         28640.
3 ≥75 g/µL     10157092       3391101    84      120918.         40370.
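The per-sample means in this table appear to be the read totals divided by the number of samples in each density stratum. A minimal sketch that recomputes them from the printed totals follows; the dictionary and function names are illustrative and are not taken from the original analysis.

```python
# Totals copied from the table above: (raw reads, reads after DADA2, samples).
depth_by_density = {
    "<1.5 g/µL": (4289511, 1253629, 90),
    "1.5-75 g/µL": (7958631, 2577615, 90),
    "≥75 g/µL": (10157092, 3391101, 84),
}


def per_sample_means(raw_reads, dada2_reads, n_samples):
    """Return mean raw and mean DADA2 reads per sample, rounded to whole reads."""
    return round(raw_reads / n_samples), round(dada2_reads / n_samples)


means = {dens: per_sample_means(*row) for dens, row in depth_by_density.items()}
for dens, (mean_raw, mean_dada2) in means.items():
    print(f"{dens}: mean raw = {mean_raw}, mean DADA2 = {mean_dada2}")
```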
Number of samples that returned usable reads
[1] 257
Per-sample read depth
Number of haplotype occurrences per target
Median number of reads per target
[1] 1542
Overview numbers
# A tibble: 3 × 6
  hap_type           n_type sum_reads   tot pct_count pct_reads
  <fct>               <int>     <dbl> <int>     <dbl>     <dbl>
1 Expected reference    859   6382389  1292      66.5      88.4
2 Systematic error      254    761330  1292      19.7      10.5
3 Random error          179     78626  1292      13.9       1.1
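The percentage columns can be checked directly from the counts. The sketch below assumes pct_count is each type's share of all haplotype occurrences and pct_reads is its share of all reads, which matches the printed values; the variable names are illustrative.

```python
# Counts copied from the overview table above: (haplotype occurrences, reads).
overview = {
    "Expected reference": (859, 6382389),
    "Systematic error": (254, 761330),
    "Random error": (179, 78626),
}

total_occurrences = sum(n for n, _ in overview.values())
total_reads = sum(reads for _, reads in overview.values())

# Share of occurrences and share of reads per haplotype type, in percent.
shares = {
    hap_type: (
        round(100 * n / total_occurrences, 1),
        round(100 * reads / total_reads, 1),
    )
    for hap_type, (n, reads) in overview.items()
}

for hap_type, (pct_count, pct_reads) in shares.items():
    print(f"{hap_type}: {pct_count}% of occurrences, {pct_reads}% of reads")
```

Random errors make up about one in seven haplotype occurrences but only about 1% of reads, so they are individually shallow.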
Sample-level overview
Proportion of reads and haplotype occurrences that are false positive
Summary statistics of false positives
# A tibble: 15 × 6
# Groups: target [5]
   target  hap_type           n_type n_reads pct_type pct_reads
   <chr>   <fct>               <int>   <dbl>    <dbl>     <dbl>
 1 *ama1*  Expected reference    205 1799151     70.9      95.7
 2 *ama1*  Systematic error       30   51043     10.4       2.7
 3 *ama1*  Random error           54   30340     18.7       1.6
 4 *csp*   Expected reference    225 2313875     77.6      87.4
 5 *csp*   Systematic error       40  323921     13.8      12.2
 6 *csp*   Random error           25   10152      8.6       0.4
 7 *msp7*  Expected reference    201  541878     74.4      78.2
 8 *msp7*  Systematic error       53  145442     19.6      21
 9 *msp7*  Random error           16    5536      5.9       0.8
10 *sera2* Expected reference    175 1326908     65.8      87.7
11 *sera2* Systematic error       63  179454     23.7      11.9
12 *sera2* Random error           28    7102     10.5       0.5
13 *trap*  Expected reference     53  400577     29.9      82.2
14 *trap*  Systematic error       68   61470     38.4      12.6
15 *trap*  Random error           56   25496     31.6       5.2
# A tibble: 3 × 6
# Groups: dens [3]
  dens        hap_type     n_type n_reads pct_type pct_reads
  <fct>       <fct>         <int>   <dbl>    <dbl>     <dbl>
1 <1.5 g/µL   Random error     41   40011     16.1       3.2
2 1.5-75 g/µL Random error     64   19844     12.3       0.8
3 ≥75 g/µL    Random error     74   18771     14.3       0.6
Number of reads supporting true positives vs false positives
# A tibble: 1 × 3
median_in median_not_in wilcox_p
<dbl> <dbl> <dbl>
1 2393 104 1.32e-70
Read depth of false positives
Characteristics of false positives
Sensitivity-specificity plots
Optimal thresholds table
# A tibble: 9 × 6
metric threshold ci025 median ci975 type
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 threshold 275 204. 275 420. depth
2 specificity 0.749 0.676 0.749 0.821 depth
3 sensitivity 0.950 0.917 0.952 0.972 depth
4 threshold 0.00713 0.00521 0.00808 0.0139 prop
5 specificity 0.525 0.458 0.559 0.682 prop
6 sensitivity 0.970 0.903 0.964 0.987 prop
7 threshold 0.208 0.0895 0.208 0.361 ratio
8 specificity 0.671 0.443 0.671 0.848 ratio
9 sensitivity 0.817 0.718 0.831 0.930 ratio
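To make the sensitivity and specificity rows concrete, here is a minimal sketch of how both can be computed for a single read-depth cutoff. It assumes a haplotype is kept when its depth is at or above the threshold, that sensitivity is the fraction of true haplotypes kept, and that specificity is the fraction of false haplotypes censored. These definitions, the function name, and the toy depths are assumptions for illustration; the output above does not state how the thresholds or their bootstrap intervals were derived.

```python
def sens_spec(true_depths, false_depths, threshold):
    """Sensitivity and specificity of keeping haplotypes with depth >= threshold."""
    kept_true = sum(depth >= threshold for depth in true_depths)
    censored_false = sum(depth < threshold for depth in false_depths)
    return kept_true / len(true_depths), censored_false / len(false_depths)


# Hypothetical read depths, invented for illustration only.
toy_true = [2400, 1800, 950, 310, 120]
toy_false = [40, 104, 260, 900]

sensitivity, specificity = sens_spec(toy_true, toy_false, threshold=275)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Raising the threshold censors more false haplotypes at the cost of dropping more true ones, which is the trade-off the table summarises for each of the three criteria.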
Number of haplotype occurrences and haplotypes pre- and post-censoring
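Haplotype occurrences pre-censoring
[1] 1292
Haplotype occurrences post-censoring
[1] 975
Proportion of haplotype occurrences retained
[1] 0.754644
Haplotypes pre-censoring
[1] 124
Haplotypes post-censoring
[1] 59
Proportion of haplotypes retained
[1] 0.4758065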
Number of samples that returned no haplotypes post-censoring
[1] 3
Upset plot of how haplotypes are censored
Summary of censored and uncensored haplotypes
# A tibble: 6 × 5
# Groups: hap_type [3]
  censored     hap_type           count   tot   pct
  <chr>        <fct>              <int> <int> <dbl>
1 Censored     Expected reference    67   859     8
2 Censored     Systematic error     102   254    40
3 Censored     Random error         148   179    83
4 Not censored Expected reference   792   859    92
5 Not censored Systematic error     152   254    60
6 Not censored Random error          31   179    17
Summary of how haplotypes were censored
# A tibble: 4 × 4
# Groups: name [4]
name n_fps n pct
<chr> <int> <int> <dbl>
1 lendiff 179 50 28
2 prop_lt_thresh 179 97 54
3 ratio_lt_thresh 179 53 30
4 reads_lt_thresh 179 134 75
Comparison of haplotypes that passed and did not pass censoring
Censoring results by density
Censoring of false positive haplotypes by density
# A tibble: 3 × 5
dens No Yes tot prop
<fct> <int> <int> <int> <dbl>
1 <1.5 11 13 24 0.458
2 1.5-75 6 39 45 0.133
3 ≥75 0 54 54 0
Fisher's Exact Test for Count Data
data: .
p-value = 1.964e-07
alternative hypothesis: two.sided
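For a table with more than two rows, a two-sided exact test sums the probabilities of every table with the same margins that is no more likely than the observed one. The sketch below implements that definition with exact integer arithmetic for an r x 2 table. It is an illustration under that assumption, not the routine that produced the output above, and the function name is hypothetical. R's fisher.test is understood to follow the same definition with a small tolerance for ties, so the result should land close to the reported p-value.

```python
from math import comb


def fisher_exact_rx2(rows):
    """Two-sided exact test for an r x 2 table given as [(col1, col2), ...]."""
    row_totals = [a + b for a, b in rows]
    col1_total = sum(a for a, _ in rows)

    def weight(col1_counts):
        # Unnormalised hypergeometric probability of a table with these
        # first-column counts (margins fixed).
        w = 1
        for total, count in zip(row_totals, col1_counts):
            w *= comb(total, count)
        return w

    def tables(index, remaining):
        # Enumerate every first-column filling that keeps both margins fixed.
        if index == len(row_totals) - 1:
            if remaining <= row_totals[index]:
                yield (remaining,)
            return
        for count in range(min(row_totals[index], remaining) + 1):
            for rest in tables(index + 1, remaining - count):
                yield (count,) + rest

    observed = weight([a for a, _ in rows])
    extreme = sum(w for w in map(weight, tables(0, col1_total)) if w <= observed)
    return extreme / comb(sum(row_totals), col1_total)


# (No, Yes) counts per density stratum, copied from the table above.
p_value = fisher_exact_rx2([(11, 13), (6, 39), (0, 54)])
print(f"p = {p_value:.3e}")
```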
Censored true positives
Characteristics of censored true positives
# A tibble: 3 × 4
# Groups: dens [3]
dens tot n prop
<fct> <int> <int> <dbl>
1 <1.5 67 13 0.194
2 1.5-75 67 15 0.224
3 ≥75 67 39 0.582
# A tibble: 2 × 4
# Groups: ref_pct >= 10 [2]
`ref_pct >= 10` tot n prop
<lgl> <int> <int> <dbl>
1 FALSE 67 62 0.925
2 TRUE 67 5 0.0746
Percent agreement across replicates
# A tibble: 3 × 4
# Groups: n [3]
n tot nn prop
<int> <int> <int> <dbl>
1 1 416 99 0.238
2 2 416 75 0.180
3 3 416 242 0.582
# A tibble: 9 × 5
# Groups: dens, n [9]
  dens            n   tot    nn  prop
  <fct>       <int> <int> <int> <dbl>
1 <1.5 g/µL       1   109    51 0.468
2 <1.5 g/µL       2   109    25 0.229
3 <1.5 g/µL       3   109    33 0.303
4 1.5-75 g/µL     1   163    32 0.196
5 1.5-75 g/µL     2   163    31 0.190
6 1.5-75 g/µL     3   163   100 0.613
7 ≥75 g/µL        1   144    16 0.111
8 ≥75 g/µL        2   144    19 0.132
9 ≥75 g/µL        3   144   109 0.757
Pairwise Jaccard similarity of replicates
# A tibble: 1 × 2
median_jac iqr_jac
<dbl> <dbl>
1 0.833 0.5
# A tibble: 3 × 3
dens median_jac iqr_jac
<fct> <dbl> <dbl>
1 <1.5 0.5 0.75
2 1.5-75 0.833 0.5
3 ≥75 1 0.2
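The Jaccard index between two replicates is the number of haplotypes they share divided by the number found in either one, so 1 means identical haplotype sets and 0 means no overlap; the corresponding distance is one minus the index. A minimal sketch with hypothetical haplotype calls follows. The replicate contents, the function name, and the handling of two empty replicates are assumptions for illustration.

```python
from itertools import combinations
from statistics import median


def jaccard_index(a, b):
    """Shared haplotypes divided by all haplotypes seen in either replicate."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty replicates count as identical
    return len(a & b) / len(a | b)


# Hypothetical haplotype calls for three replicates of one sample.
replicates = {
    "rep1": {"hapA", "hapB", "hapC"},
    "rep2": {"hapA", "hapB"},
    "rep3": {"hapA", "hapB", "hapC", "hapD"},
}

# Index for every pair of replicates, then the median across pairs.
pairwise = {
    (r1, r2): jaccard_index(replicates[r1], replicates[r2])
    for r1, r2 in combinations(replicates, 2)
}
median_index = median(pairwise.values())
print(pairwise, median_index)
```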
Number of expected haplotypes that were missing
# A tibble: 2 × 4
# Groups: found [2]
found tot n prop
<chr> <int> <int> <dbl>
1 No 1365 477 0.349
2 Yes 1365 888 0.651
Found and missing haplotypes by reference percent
Proportion missing by read depth
Correlation between reference and read proportion
Replicate missingness by reference proportion
Number of replicates in which haplotype was found
# A tibble: 12 × 5
# Groups: dens [3]
   dens        in_n_rep count   tot   pct
   <fct>          <int> <int> <int> <dbl>
 1 ≥75 g/µL           0    30   166    18
 2 ≥75 g/µL           1     5   166     3
 3 ≥75 g/µL           2    14   166     8
 4 ≥75 g/µL           3   117   166    70
 5 1.5-75 g/µL        0    62   202    31
 6 1.5-75 g/µL        1    20   202    10
 7 1.5-75 g/µL        2    27   202    13
 8 1.5-75 g/µL        3    93   202    46
 9 <1.5 g/µL          0    78   158    49
10 <1.5 g/µL          1    34   158    22
11 <1.5 g/µL          2    21   158    13
12 <1.5 g/µL          3    25   158    16
Missingness risk factors
# A tibble: 9 × 4
  feature  term            bivariate                    multivariate
  <chr>    <chr>           <chr>                        <chr>
1 ref_prop ref_pct         0.98 (0.97-0.98); p=4.7e-11  0.96 (0.96-0.97); p=6e-…
2 target   target*csp*     0.76 (0.49-1.18); p=0.23     0.9 (0.54-1.51); p=0.7
3 target   target*msp7*    1.24 (0.81-1.89); p=0.32     0.4 (0.23-0.68); p=8e-04
4 target   target*sera2*   1.68 (1.1-2.57); p=0.016     1.05 (0.63-1.77); p=0.8
5 target   target*trap*    21.37 (13.02-35.08); p=1e-33 6.13 (3.13-12.03); p=1e…
6 density  dens1.5-75 g/µL 1.62 (0.76-3.45); p=0.21     1.47 (0.75-2.88); p=0.3
7 density  dens<1.5 g/µL   6.27 (2.87-13.67); p=3.9e-06 3.88 (1.82-8.27); p=5e-…
8 reads    reads_10000     0.57 (0.53-0.62); p=5.7e-40  0.61 (0.54-0.69); p=3e-…
9 moi      expected_moi    0.32 (0.24-0.44); p=1.2e-12  1.08 (0.91-1.27); p=0.4
Observed vs expected MOI
# A tibble: 3 × 4
# Groups: obs_min_exp_moi_cat [3]
  obs_min_exp_moi_cat    tot     n   pct
  <chr>                <int> <int> <dbl>
1 Higher than expected   254    26    10
2 Lower than expected    254   154    61
3 Same as expected       254    74    29
# A tibble: 1 × 4
median_low median_mid median_high wilcox_p
<dbl> <dbl> <dbl> <dbl>
1 -4 -1 -1 0.00000000221
Clinical samples
# A tibble: 2 × 4
# Groups: censored [2]
censored tot n pct
<lgl> <int> <int> <dbl>
1 FALSE 142 106 75
2 TRUE 142 36 25